This paper gives a general method for deriving limiting distributions of complete case statistics for missing data models from corresponding results for the model where all data are observed. This provides a convenient tool for obtaining the asymptotic behavior of complete case versions of established full data methods without lengthy proofs.

The methodology is illustrated by analyzing three inference pro- 
cedures for partially linear regression models with responses missing 
at random. We first show that complete case versions of asymptot- 
ically efficient estimators of the slope parameter for the full model 
are efficient, thereby solving the problem of constructing efficient es- 
timators of the slope parameter for this model. Second, we derive an 
asymptotically distribution free test for fitting a normal distribution 
to the errors. Finally, we obtain an asymptotically distribution free 
test for linearity, that is, for testing that the nonparametric compo- 
nent of these models is a constant. This test is new both when data 
are fully observed and when data are missing at random. 

1. Introduction. The basis for regression is a response variable $Y$ and a covariate vector $X$ which are linked via the formula $Y = r(X) + \varepsilon$, where $r$ is a regression function and $\varepsilon$ is an error variable. The analysis is then carried out based on independent copies $(X_1,Y_1),\dots,(X_n,Y_n)$ of the pair $(X,Y)$. We refer to this as the full model. In applications, however, responses may be missing. The base observation is then a triple $(X,\delta Y,\delta)$, where $\delta$ is an indicator variable with $E[\delta] = P(\delta = 1) > 0$. The interpretation is that for $\delta = 1$, one observes the pair $(X,Y)$, while for $\delta = 0$, one only observes the covariate $X$. The analysis is now based on independent copies $(X_1,\delta_1Y_1,\delta_1),\dots,(X_n,\delta_nY_n,\delta_n)$ of the observation $(X,\delta Y,\delta)$. An accepted way of analyzing such data is by imputing the missing responses. Here we take a closer look at complete case analysis. This method ignores the incomplete observations and works with only the $N = \sum_{j=1}^n \delta_j$ completely observed pairs $(X_{i_1},Y_{i_1}),\dots,(X_{i_N},Y_{i_N})$. Formally, to each statistic

$$T_n = t_n(X_1,Y_1,\dots,X_n,Y_n)$$

for the full model there corresponds the complete case statistic

$$T_c = t_N(X_{i_1},Y_{i_1},\dots,X_{i_N},Y_{i_N}),$$

which mimics the statistic $T_n$ by treating $(X_{i_1},Y_{i_1}),\dots,(X_{i_N},Y_{i_N})$ as if it were a sample of size $N$ from the original setting without missing data.
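The passage from a full-model statistic to its complete case version can be sketched in a few lines. This is only an illustration: the helper names `complete_case` and `mean_response` and the toy data are assumptions made here, not notation from the paper, and the average response merely stands in for a generic statistic.

```python
from statistics import fmean
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[float, float]

def complete_case(stat: Callable[[List[Pair]], float],
                  xs: Sequence[float],
                  delta_ys: Sequence[float],
                  deltas: Sequence[int]) -> float:
    """Apply a full-model statistic to the completely observed pairs only."""
    # Keep the pairs with delta = 1; for those, delta * Y equals Y.
    observed = [(x, dy) for x, dy, d in zip(xs, delta_ys, deltas) if d == 1]
    return stat(observed)

def mean_response(pairs: List[Pair]) -> float:
    """Hypothetical full-model statistic: the average response."""
    return fmean([y for _, y in pairs])

# Toy observations (X_j, delta_j * Y_j, delta_j); responses with delta = 0 are recorded as 0.
xs = [0.1, 0.4, 0.5, 0.9]
delta_ys = [1.0, 0.0, 3.0, 0.0]
deltas = [1, 0, 1, 0]
t_c = complete_case(mean_response, xs, delta_ys, deltas)  # average of 1.0 and 3.0
```

The wrapper treats the $N$ retained pairs exactly as a sample of size $N$; it does not handle the case $N = 0$, where the statistic would have to be set to a constant.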

Our main result gives a simple and useful method for obtaining the asymptotic distribution of $T_c$. We show that the limiting distribution of $T_c$ coincides with that of $\tilde T_n = t_n(\tilde X_1,\tilde Y_1,\dots,\tilde X_n,\tilde Y_n)$, where $(\tilde X_1,\tilde Y_1),\dots,(\tilde X_n,\tilde Y_n)$ form a random sample drawn from the conditional distribution $\tilde Q$ of $(X,Y)$, given $\delta = 1$; see Remark 2.4. This can be used as follows. One typically knows the limiting distribution $\mathcal L(Q)$ of $T_n$ under each joint distribution $Q$ of $X$ and $Y$ belonging to some model. If the distribution $\tilde Q$ of $(\tilde X,\tilde Y)$ belongs to this model, then the limiting distribution of the complete case statistic is $\mathcal L(\tilde Q)$. We refer to this as the transfer principle. It provides a convenient tool for obtaining the asymptotic behavior of complete case versions of established full data methods without (reproducing) lengthy proofs.

Of special interest are statistics $T_n$ that are asymptotically linear for a functional $T$ from a class $\mathcal Q$ of distributions into $\mathbb R^m$ in the sense that if $X$ and $Y$ have joint distribution $Q$ belonging to the model $\mathcal Q$, then the expansion

$$T_n = T(Q) + \frac{1}{n}\sum_{j=1}^n \psi_Q(X_j,Y_j) + o_p(n^{-1/2})$$

holds. Here $\psi_Q$ is a measurable function into $\mathbb R^m$ such that $E[\psi_Q(X,Y)] = 0$ and $E[\|\psi_Q(X,Y)\|^2]$ is finite when $X$ and $Y$ have joint distribution $Q$. Here and below $\|\cdot\|$ denotes the Euclidean norm. The function $\psi_Q$ is commonly called an influence function. From the above expansion we obtain that $n^{1/2}(T_n - T(Q))$ is asymptotically normal with the zero vector as mean and with dispersion matrix $\Sigma(Q) = E[\psi_Q(X,Y)\psi_Q^\top(X,Y)]$. If $\tilde Q$ belongs to the model $\mathcal Q$, then we have the expansion

$$\tilde T_n = T(\tilde Q) + \frac{1}{n}\sum_{j=1}^n \psi_{\tilde Q}(\tilde X_j,\tilde Y_j) + o_p(n^{-1/2}),$$

and obtain from our main result that

$$T_c = T(\tilde Q) + \frac{1}{N}\sum_{j=1}^n \delta_j\psi_{\tilde Q}(X_j,Y_j) + o_p(n^{-1/2});$$

see Remark 2.5. From this we immediately derive the expansion

$$T_c = T(\tilde Q) + \frac{1}{n}\sum_{j=1}^n \frac{\delta_j}{E[\delta]}\psi_{\tilde Q}(X_j,Y_j) + o_p(n^{-1/2}).$$

Thus, if $\tilde Q$ belongs to the model $\mathcal Q$ and $T(\tilde Q)$ equals $T(Q)$, then $T_c$ is asymptotically linear in the model with missing data with influence function $\psi$, where

$$\psi(X,\delta Y,\delta) = \frac{\delta}{E[\delta]}\psi_{\tilde Q}(X,Y).$$

We refer to this as the transfer principle for asymptotically linear statistics. It yields that $n^{1/2}(T_c - T(Q))$ is asymptotically normal with the zero vector as mean and with dispersion matrix $(1/E[\delta])\Sigma(\tilde Q)$.
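The step from the expansion with the random normalization $1/N$ to the one with the weights $\delta_j/(nE[\delta])$, and the form of the limiting dispersion matrix, can be spelled out as follows. This is a sketch added for the reader, not part of the original argument; it uses only the law of large numbers for $N/n$, the identity $\delta^2 = \delta$, and the moment conditions on the influence function, with $\tilde Q$ denoting the conditional distribution of $(X,Y)$ given $\delta = 1$.

```latex
\begin{align*}
\frac{1}{N}\sum_{j=1}^n \delta_j \psi_{\tilde Q}(X_j,Y_j)
  &= \frac{n}{N}\cdot\frac{1}{n}\sum_{j=1}^n \delta_j \psi_{\tilde Q}(X_j,Y_j),
  \qquad \frac{N}{n}\to E[\delta]\ \text{a.s.},\\
E[\delta\,\psi_{\tilde Q}(X,Y)]
  &= E[\delta]\,E[\psi_{\tilde Q}(X,Y)\mid \delta=1] = 0
  \;\Longrightarrow\;
  \frac{1}{n}\sum_{j=1}^n \delta_j \psi_{\tilde Q}(X_j,Y_j) = O_p(n^{-1/2}),\\
\Big(\frac{n}{N}-\frac{1}{E[\delta]}\Big)\,O_p(n^{-1/2}) &= o_p(n^{-1/2}),\\
E\Big[\frac{\delta^2}{E[\delta]^2}\,\psi_{\tilde Q}(X,Y)\psi_{\tilde Q}^\top(X,Y)\Big]
  &= \frac{E[\delta]}{E[\delta]^2}\,
     E\big[\psi_{\tilde Q}(X,Y)\psi_{\tilde Q}^\top(X,Y)\mid\delta=1\big]
   = \frac{1}{E[\delta]}\,\Sigma(\tilde Q).
\end{align*}
```

The factor $1/E[\delta]$ reflects that, on average, only the fraction $E[\delta]$ of the $n$ observations enters the complete case statistic.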

The key to a successful application of the transfer principle is the condition $T(\tilde Q) = T(Q)$. Under this condition, $n^{1/2}$-consistency carries over to the complete case statistic. If this condition is not met, the complete case statistic will be biased for estimating $T(Q)$.

For our illustration of the transfer principle we consider the important case where the response $Y$ is missing at random (MAR). This means that the indicator $\delta$ is conditionally independent of $Y$, given $X$, that is,

$$P(\delta = 1\mid X,Y) = P(\delta = 1\mid X) = \pi(X) \quad\text{a.s.}$$

This is a common assumption and reasonable in many applications [see Little 
and Rubin (2002), Chapter 1]. This model is referred to as the MAR model. 
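A small exact computation may help to see when the condition that the functional takes the same value under the full distribution and under the conditional distribution given $\delta = 1$ holds under MAR. The discrete toy model below is an assumption made for this illustration and does not appear in the paper: the mean response changes when one conditions on $\delta = 1$, while the regression function does not.

```python
from fractions import Fraction as Frac

# Hypothetical toy model: X is 0 or 1 with equal probability and
# Y = X + e, where e is -1 or +1 with equal probability, independent of X.
p_x = {0: Frac(1, 2), 1: Frac(1, 2)}
p_e = {-1: Frac(1, 2), 1: Frac(1, 2)}
pi = {0: Frac(1, 5), 1: Frac(4, 5)}  # MAR: P(delta = 1 | X, Y) depends on X only

def expect(g, given_observed=False):
    """E[g(X, Y)], or E[g(X, Y) | delta = 1] when given_observed is True."""
    num = Frac(0)
    den = Frac(0)
    for x, px in p_x.items():
        for e, pe in p_e.items():
            weight = px * pe * (pi[x] if given_observed else 1)
            num += weight * g(x, x + e)
            den += weight
    return num / den

def regression(x0, given_observed=False):
    """E[Y | X = x0], optionally also conditioning on delta = 1."""
    indicator = lambda x, y: 1 if x == x0 else 0
    joint = expect(lambda x, y: y * indicator(x, y), given_observed)
    return joint / expect(indicator, given_observed)

mean_full = expect(lambda x, y: y)                     # E[Y]
mean_cc = expect(lambda x, y: y, given_observed=True)  # E[Y | delta = 1]
```

In this toy model the complete case mean of $Y$ targets $4/5$ rather than $E[Y] = 1/2$, so the mean response violates the condition, whereas `regression(x0)` and `regression(x0, given_observed=True)` agree for both values of `x0`.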

It is well known that the complete case analysis does not always perform well and that an approach which imputes missing values often has better statistical properties. See, for example, Chapter 3 of Little and Rubin (2002) for examples where using the complete case approach results in bias or a loss of precision, due to the loss of information. For a discussion of various imputing methods we again refer to Little and Rubin (2002), and also to Müller, Schick and Wefelmeyer (2006), who propose efficient estimators for various regression settings which impute missing and non-missing responses.

Although complete case analysis can lead to the above-mentioned problems, there are situations where it provides useful and optimal inference procedures. Efromovich (2011), for example, considers nonparametric regression with responses missing at random. He shows that his complete case estimator of the regression function is optimal in the sense that it satisfies an asymptotic sharp minimax property. Müller (2009) demonstrates efficiency of a complete case estimator for the parameter vector in the nonlinear regression model.

For simplicity and clarity, we illustrate the above transfer principle using the partially linear regression model. In this model the response $Y$ is linked to covariates $U$ and $V$ via the relation

$$Y = \vartheta^\top U + \rho(V) + \varepsilon, \tag{1.1}$$

with $\vartheta$ an unknown $m$-dimensional vector and $\rho$ an unknown twice continuously differentiable function. The error $\varepsilon$ is assumed to have mean zero, finite variance $\sigma^2$ and a density $f$, and is independent of the covariates $(U,V)$, where the random vector $U$ has dimension $m$ and the random variable $V$ takes values in the compact interval $[0,1]$. Throughout this paper, we impose the following conditions on the joint distribution $G$ of $U$ and $V$:

(G1) The covariate $V$ has a density that is bounded and bounded away from zero on $[0,1]$.

(G2) The covariate vector $U$ satisfies $E[\|U\|^2] < \infty$ and the matrix

$$W_G = E[(U - \mu_G(V))(U - \mu_G(V))^\top]$$

is positive definite, where $\mu_G(V) = E[U\mid V]$.

The requirement involving $W_G$ is needed to identify the parameter $\vartheta$.

One important problem is the efficient estimation of the regression parameter $\vartheta$ in (1.1). This is addressed in our first illustration of the transfer principle below. The crucial condition for a successful application of the transfer principle, $T(\tilde Q) = T(Q)$, is satisfied in this case and, more generally, also for functionals of the triple $(\vartheta,\rho,f)$. The MAR assumption and the independence of $\varepsilon$ and $(U,V)$ imply that $\varepsilon$ and $(U,V,\delta)$ are independent. Hence, the regression parameters $\vartheta$ and $\rho$ and the error density $f$ stay the same when conditioning on $\delta = 1$. Only the covariate distribution $G$ changes to $\tilde G$, the conditional distribution of $(U,V)$ given $\delta = 1$. This argument suggests that inference about the triple $(\vartheta,\rho,f)$ should be carried out using a complete case analysis, because the complete case observations are sufficient for $(\vartheta,\rho,f,\tilde G)$ since they carry all the information about these parameters. The covariate pair $(U,V)$ alone, on the other hand, has no information on $(\vartheta,\rho,f)$, and hence has no bearing on the inference about these parameters when the response $Y$ is missing at random. The same reasoning also applies to general semiparametric regression models: inference about the regression function and the error distribution should be based on the complete cases only.

In order to obtain an efficient estimator for $\vartheta$, we must assume that the error density $f$ has finite Fisher information for location. This means that $f$ is absolutely continuous with a.e. derivative $f'$ such that $J_f = \int \ell_f^2(x)f(x)\,dx$ is finite, where $\ell_f = -f'/f$ is the score function for location. Efficient estimators $\hat\vartheta_n$ of $\vartheta$ in the full model are characterized by the stochastic expansion

$$\hat\vartheta_n = \vartheta + \frac{1}{n}\sum_{j=1}^n (J_fW_G)^{-1}(U_j - \mu_G(V_j))\ell_f(\varepsilon_j) + o_p(n^{-1/2});$$

see, for example, Schick (1993). Because of the structure of the MAR model introduced above, the transfer principle for asymptotically linear statistics yields that the complete case version $\hat\vartheta_c$ of an efficient estimator satisfies the expansion

$$\hat\vartheta_c = \vartheta + \frac{1}{n}\sum_{j=1}^n \frac{\delta_j}{E[\delta]}(J_fW_{\tilde G})^{-1}(U_j - \mu_{\tilde G}(V_j))\ell_f(\varepsilon_j) + o_p(n^{-1/2}). \tag{1.2}$$

This of course requires that $\tilde G$ satisfies the properties (G1) and (G2). This is the case when $\pi$ is bounded away from zero; see Remark 3.1. Here $\pi(X) = \pi(U,V) = P(\delta = 1\mid U,V)$.
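As a concrete special case, added here for illustration and not taken from the paper, suppose the error density $f$ is the $N(0,\sigma^2)$ density. The score function, the Fisher information and the efficient influence function in the full model then take a familiar least squares form; the last line uses the independence of $\varepsilon$ and $(U,V)$.

```latex
\begin{align*}
f(x) &= \frac{1}{\sigma\sqrt{2\pi}}\,e^{-x^2/(2\sigma^2)},
\qquad \ell_f(x) = -\frac{f'(x)}{f(x)} = \frac{x}{\sigma^2},
\qquad J_f = \int \frac{x^2}{\sigma^4}\,f(x)\,dx = \frac{1}{\sigma^2},\\
(J_f W_G)^{-1}(U-\mu_G(V))\,\ell_f(\varepsilon)
 &= \sigma^2\,W_G^{-1}(U-\mu_G(V))\,\frac{\varepsilon}{\sigma^2}
  = W_G^{-1}(U-\mu_G(V))\,\varepsilon,\\
E\big[W_G^{-1}(U-\mu_G(V))(U-\mu_G(V))^\top W_G^{-1}\,\varepsilon^2\big]
 &= \sigma^2\,W_G^{-1}.
\end{align*}
```

Under these assumptions the transfer principle for asymptotically linear statistics gives the dispersion matrix $\sigma^2 W_{\tilde G}^{-1}/E[\delta]$ for the complete case version, where $\tilde G$ is the conditional distribution of $(U,V)$ given $\delta = 1$.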

Although several estimators exist which are efficient in the full partially linear model, to our knowledge no efficient estimators have so far been constructed for the corresponding MAR model. We show in Section 3 that the expansion (1.2) of $\hat\vartheta_c$ characterizes asymptotically efficient estimators of $\vartheta$ in the MAR model. This means that complete case versions of efficient estimators in the full model remain efficient in the MAR model (under appropriate conditions). This result in turn solves the important problem of constructing efficient estimators for $\vartheta$ in the partially linear MAR model. For constructions of efficient estimators in the full model (1.1), we refer the reader to Cuzick (1992), Schick (1993, 1996), Bhattacharya and Zhao (1997) and Forrester et al. (2003). Some of these constructions require smoothness assumptions on $\mu_G$. Then the validity of (1.2) requires the same smoothness assumptions on $\mu_{\tilde G}$.

The above method of constructing efficient estimators for the finite-dimensional parameter also yields efficient estimators in other semiparametric regression MAR models. The influence function of the complete case version of an estimator efficient for the full model is given by the transfer principle for asymptotically linear estimators. One then only needs to show that this influence function is the efficient influence function for the MAR model. The latter can be done by mimicking the results in Section 3. There we sketch this approach for the partially linear model with additive $\rho$ and for a single index model. Müller (2009) has calculated the efficient influence function for the regression parameter in a nonlinear regression model. Using the transfer principle, one sees that the efficient influence function equals the influence function of the complete case version of an efficient estimator for the full model. This provides a simple derivation of efficient estimators in her model.

We believe that the above efficiency transfer is valid for the estimation of other characteristics in the MAR model (1.1). We expect that the efficiency transfer generalizes to the estimation of (smooth) functionals of the triple $(\vartheta,\rho,f)$. This includes as important special cases the estimation of the error distribution function, the error variance and other characteristics of $f$ such as quantiles and moments of $f$. However, further research is needed to crystallize the issues involved.

Next we illustrate the transfer principle on goodness-of-fit and lack-of-fit tests. There is a vast literature on goodness-of-fit tests for fitting an error distribution and lack-of-fit tests for fitting a regression function in fully observable regression models. See, for example, Hart (1997) and the review article by Koul (2006), and the references therein. Here we shall discuss two important examples for the MAR regression models. One pertains to fitting a parametric distribution to the error distribution in (1.1) and the other to testing whether $\rho$ in the model (1.1) is a constant or not. In both examples the proposed tests are complete case analogs of full model tests that are asymptotically distribution free, that is, the limiting distribution of the test statistic under the null hypothesis is the same for all members of the null model being fitted. Due to the transfer principle, the same conclusion continues to hold for the proposed tests for the MAR model (1.1).

First, consider the goodness-of-fit testing problem in the model (1.1) and the null hypothesis $H_0: \varepsilon \sim N(0,\sigma^2)$, for some unknown $0 < \sigma < \infty$. For the full model a residual-based test of this hypothesis was introduced by Müller, Schick and Wefelmeyer (2012) (MSW) adapting a martingale transform test of Khmaladze and Koul (2009) for fitting a parametric family of error distributions in nonparametric regression. In (1.1), the residuals are of the form $\hat\varepsilon_j = Y_j - \hat\vartheta^\top U_j - \hat\rho(V_j)$, where $\hat\vartheta$ is a $\sqrt n$-consistent estimator of $\vartheta$ and $\hat\rho$ is a nonparametric estimator of $\rho$, such as a local smoother based on the covariates $V_j$ and the modified responses $Y_j - \hat\vartheta^\top U_j$, or a series estimator. Let $\hat\sigma = \big(\frac{1}{n}\sum_{j=1}^n \hat\varepsilon_j^2\big)^{1/2}$ denote the estimator of the standard deviation $\sigma$ and $\hat Z_j = \hat\varepsilon_j/\hat\sigma$, $j = 1,\dots,n$, denote the standardized residuals. The test statistic of MSW is then

$$T_n = \sup_{t\in\mathbb R}\Big|\frac{1}{\sqrt n}\sum_{j=1}^n \big\{1[\hat Z_j \le t] - H(\hat Z_j\wedge t)h(\hat Z_j)\big\}\Big|$$

for some known functions $h$ and $H$ related to the standard normal distribution function and its derivatives; see Section 4, equation (4.3). Here we work with a series estimator of $\rho$, which is discussed in Section 4 of MSW. This requires no additional assumptions. The test based on $T_n$ is asymptotically distribution free, because under the null hypothesis, $T_n$ converges in distribution to

$$\zeta = \sup_{0\le t\le 1}|B(t)|, \tag{1.3}$$

where $B$ is a standard Brownian motion. Due to the transfer principle, the complete case version $T_c$ of the above $T_n$ has the same limiting distribution under the null hypothesis. Hence, the null hypothesis is rejected if $T_c$ exceeds the upper $\alpha$ quantile of the distribution of $\zeta$. See Section 4, equation (4.4), and the discussion around it for a detailed description of the complete case variant $T_c$ of the above $T_n$. From the discussion on optimality of this test in Khmaladze and Koul (2009) and the transfer principle, it follows that the test based on $T_c$ will generally be more powerful than the complete case test based on the Kolmogorov-Smirnov statistic.
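To carry out the test one needs the upper $\alpha$ quantile of the supremum of the absolute value of a standard Brownian motion over $[0,1]$. A minimal sketch of how such a critical value might be computed is given below. It relies on the classical series for the probability that Brownian motion stays inside a symmetric interval up to time one, which is a standard fact and not a formula from the paper; the function names are chosen here for illustration.

```python
import math

def cdf_sup_abs_bm(x: float, terms: int = 50) -> float:
    """P(sup_{0<=t<=1} |B(t)| <= x) for a standard Brownian motion B and x > 0.

    Uses the series (4/pi) * sum_k (-1)^k / (2k+1) * exp(-(2k+1)^2 pi^2 / (8 x^2)).
    """
    total = 0.0
    for k in range(terms):
        odd = 2 * k + 1
        total += (-1) ** k / odd * math.exp(-(odd * math.pi) ** 2 / (8.0 * x * x))
    return 4.0 / math.pi * total

def upper_quantile(alpha: float, lo: float = 0.1, hi: float = 10.0) -> float:
    """Find c with P(sup |B| > c) = alpha by bisection on [lo, hi]."""
    target = 1.0 - alpha
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if cdf_sup_abs_bm(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

crit_05 = upper_quantile(0.05)  # roughly 2.24
```

With this sketch, a level 0.05 test would reject when the complete case statistic exceeds `crit_05`. The value is close to the standard normal quantile of order $0.9875$, since the tail probability of the supremum is approximately four times the normal tail probability at this level.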

Finally, we consider testing whether $\rho$ is constant within the partially linear model, that is, we suppose that the partially linear model (1.1) holds true and test whether the regression function is in fact linear. Here we adapt an approach by Stute, Xu and Zhu (2008) for testing a general parametric model in nonparametric regression, which is based on a weighted residual-based empirical process. For the full model this suggests a test statistic of the form

$$\sup_{t\in\mathbb R}\Big|n^{-1/2}\sum_{j=1}^n W_j 1[\hat\varepsilon_{j0}\le t]\Big|,$$

where $\hat\varepsilon_{j0}$ are the residuals under the null hypothesis obtained by regressing the responses $Y_j$ on the covariates $U_j$ including an intercept, and where $W_j$ are normalized versions of the residuals obtained from regressing $\chi(V_j)$ on the covariates $U_j$ including an intercept, for a suitably chosen function $\chi$. The asymptotic null distribution of this test is that of

$$\zeta_0 = \sup_{0\le t\le 1}|B_0(t)|,$$

where $B_0$ denotes a standard Brownian bridge. This is the first test for this problem in the case of fully observed data. The transfer principle immediately shows that the complete case variant of this test described at (5.1) has the same limiting distribution.

The literature on lack-of-fit testing in the regression model when responses are missing at random is scant. Sun and Wang (2009) establish asymptotic distributional properties of some tests based on marked residual empirical processes for fitting a parametric model to the regression function when data are imputed using the inverse probability method. Sun, Wang and Dai (2009) derive tests to check the hypothesis that the partially linear model (1.1) is appropriate, based on data which are "completed" by imputing estimators for the responses. These tests are compared with tests that ignore the missing data pairs. González-Manteiga and Pérez-González (2006) use imputation to complete the data. They derive tests about linearity of the regression function in a general nonparametric regression model. Their test is similar to the above test for the last example. The last two papers report simulation results that support the superiority of these methods over a selected complete case method. However, one can verify that the first test statistic in Sun, Wang and Dai (2009) is asymptotically equivalent to a complete case statistic in their case 3, and this complete case statistic should thus result in an equivalent test. Finally, Li (2012) uses imputation together with the minimum distance methodology of Koul and Ni (2004) to propose tests for fitting a class of parametric models to the regression function that includes polynomials.

This article is organized as follows. Section 2 provides the theory for the transfer principle. The key is Lemma 2.1, which calculates the explicit form of the distribution of a complete case statistic. In Section 3 we show that the influence function of the complete case version of an efficient estimator of $\vartheta$ in the full data partially linear model is the efficient influence function for estimating $\vartheta$ in the MAR model. Similar results are sketched for a partially linear additive model (see Remark 3.2) and a single index model (see Remark 3.3). Section 4 discusses the test for normality of the errors for the MAR model, and derives expansions for the complete case residual-based empirical distribution function. In Section 5 we provide details for the complete case version of the second test about the nonparametric part $\rho$ in (1.1) being constant.

2. Distribution theory for general complete case statistics. In this section we derive the exact distribution of a complete case statistic in a general setting. Let $(\mathcal X,\mathscr A)$ be a measurable space, and, for each integer $k$, let $t_k$ be a measurable function from $\mathcal X^k$ into $\mathbb R^m$. Let $(\delta_1,\xi_1),(\delta_2,\xi_2),\dots$ be independent copies of $(\delta,\xi)$, where $\delta$ is Bernoulli with parameter $p > 0$ and $\xi$ is an $\mathcal X$-valued random variable. We denote the conditional distribution of $\xi$, given $\delta = 1$, by $\tilde Q$. Let $\tilde\xi_1,\tilde\xi_2,\dots$ be independent $\mathcal X$-valued random variables with common distribution $\tilde Q$. Denote the distribution of $t_n(\tilde\xi_1,\dots,\tilde\xi_n)$ by $R_n$. Then, for any Borel set $B$,

$$\begin{aligned}
R_n(B) &= \tilde Q^n(t_n\in B) = P(t_n(\tilde\xi_1,\dots,\tilde\xi_n)\in B)\\
&= P(t_n(\xi_1,\dots,\xi_n)\in B\mid \delta_1 = 1,\dots,\delta_n = 1).
\end{aligned}$$

By a complete case statistic associated with the sequence $(t_n)$ we mean a statistic $T_{c,n}$ of the form

$$T_{c,n} = \sum_{A\subset\{1,\dots,n\}}\Big[\prod_{i\in A}\delta_i\Big]\Big[\prod_{i\notin A}(1-\delta_i)\Big]t_{|A|}(\xi_A),$$

where $t_0(\xi_\emptyset)$ is a constant, $|A|$ denotes the cardinality of $A$ and $\xi_A$ is the vector $(\xi_{i_1},\dots,\xi_{i_k})$ with $i_1 < \dots < i_k$ the elements of the non-empty subset $A\subset\{1,\dots,n\}$. Note that the product $[\prod_{i\in A}\delta_i][\prod_{i\notin A}(1-\delta_i)]$ is the indicator function of the event $\{\delta_i = 1, i\in A\}\cap\{\delta_i = 0, i\notin A\}$. Thus, $T_{c,n}$ equals $t_{|A|}(\xi_A)$ on this event. It is now clear that $T_{c,n}$ depends on the indicators $\delta_1,\dots,\delta_n$ and only those observations $\xi_i$ for which $\delta_i = 1$.

Remark 2.1. For a measurable function $\psi$ defined on $\mathcal X$, we define the sequence $(\psi_n)$ by $\psi_n(x_1,\dots,x_n) = (\psi(x_1) + \dots + \psi(x_n))/n$. The complete case statistic associated with $(\psi_n)$ is $\sum_{j=1}^n \delta_j\psi(\xi_j)/\sum_{j=1}^n \delta_j$.

Remark 2.2. If $T_{c,n}$ is a complete case statistic associated with $(t_n)$ and $a$ is a real number, then $(\sum_{j=1}^n \delta_j)^aT_{c,n}$ is a complete case statistic associated with the sequence $(n^at_n)$.

For the remainder of this section $T_{c,n}$ denotes a complete case statistic associated with $(t_n)$ and $H_n$ its distribution. The next lemma calculates $H_n$ explicitly.

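Lemma 2.1. For every Borel subset $B$ of $\mathbb R^m$, we have

$$H_n(B) = P(T_{c,n}\in B) = \sum_{k=0}^n \binom{n}{k}p^k(1-p)^{n-k}R_k(B),$$

with $R_0(B) = 1[t_0(\xi_\emptyset)\in B]$.

Proof. Conditioning on $\delta_1,\dots,\delta_n$ yields the identity

$$P(T_{c,n}\in B\mid \delta_1,\dots,\delta_n) = \sum_{A\subset\{1,\dots,n\}}\Big[\prod_{i\in A}\delta_i\Big]\Big[\prod_{i\notin A}(1-\delta_i)\Big]H(A,B)$$

and, thus,

$$H_n(B) = \sum_{A\subset\{1,\dots,n\}}p^{|A|}(1-p)^{n-|A|}H(A,B),$$

where

$$\begin{aligned}
H(A,B) &= P(T_{c,n}\in B\mid \delta_i = 1, i\in A, \delta_j = 0, j\notin A)\\
&= P(t_{|A|}(\xi_A)\in B\mid \delta_i = 1, i\in A, \delta_j = 0, j\notin A)\\
&= \tilde Q^{|A|}(t_{|A|}\in B) = R_{|A|}(B)
\end{aligned}$$

for non-empty $A$, while $H(\emptyset,B) = R_0(B)$. The desired result is now immediate. □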
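The binomial mixture representation in the lemma can be checked exactly on a small discrete example. The sketch below is an illustration only: the joint law of the indicator and the observation, the choice of the statistic as a sum, and all function names are assumptions made here. It enumerates every outcome for $n = 3$ and compares the exact law of the complete case statistic with the mixture of the laws $R_k$.

```python
from fractions import Fraction
from itertools import product
from math import comb

# Hypothetical joint law of (delta, xi) with xi taking the values 0 and 1.
joint = {(1, 0): Fraction(1, 10), (1, 1): Fraction(3, 10),
         (0, 0): Fraction(2, 5), (0, 1): Fraction(1, 5)}
p = joint[(1, 0)] + joint[(1, 1)]                 # P(delta = 1)
q_tilde = {x: joint[(1, x)] / p for x in (0, 1)}  # law of xi given delta = 1

def t(values):
    """t_k: the sum of its k arguments, with the constant t_0 = 0."""
    return sum(values)

def law_of_complete_case_statistic(n):
    """Exact law of T_{c,n}, by enumerating all outcomes of (delta_i, xi_i)."""
    law = {}
    for outcome in product(joint, repeat=n):
        prob = Fraction(1)
        for pair in outcome:
            prob *= joint[pair]
        value = t([x for d, x in outcome if d == 1])
        law[value] = law.get(value, Fraction(0)) + prob
    return law

def binomial_mixture(n):
    """sum_k C(n, k) p^k (1 - p)^(n - k) R_k, with R_k the law of t_k under q_tilde."""
    law = {}
    for k in range(n + 1):
        weight = comb(n, k) * p ** k * (1 - p) ** (n - k)
        for xs in product(q_tilde, repeat=k):
            prob = weight
            for x in xs:
                prob *= q_tilde[x]
            value = t(xs)
            law[value] = law.get(value, Fraction(0)) + prob
    return law

h_3 = law_of_complete_case_statistic(3)
mix_3 = binomial_mixture(3)
```

Exact rational arithmetic is used so that the two laws can be compared with `==` rather than up to rounding error.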

Remark 2.3. Lemma 2.1 has the following interpretation. The statistic $T_{c,n}$ has the same distribution as $t_K(\tilde\xi_1,\dots,\tilde\xi_K)$, where $K$ is a binomial random variable with parameters $n$ and $p$, independent of $\tilde\xi_1,\tilde\xi_2,\dots$

From the lemma we immediately obtain the following results.

Corollary 2.1. The following statements hold:

(a) If the sequence $(R_n)$ is tight, so is the sequence $(H_n)$.

(b) If $R_n$ converges weakly to some limit $L$, then $H_n$ converges weakly to the same limit $L$.

(c) If $R_n$ converges weakly to point mass at 0, then $T_{c,n}$ converges in probability to zero.

Remark 2.4. Recall that $R_n$ is the distribution of $t_n(\tilde\xi_1,\dots,\tilde\xi_n)$. Thus, by (b), the limiting distribution of $T_{c,n}$ equals the limiting distribution of $t_n(\tilde\xi_1,\dots,\tilde\xi_n)$. This provides the basis for the transfer principle.

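Remark 2.5. Let $\psi$ and $\psi_n$ be as in Remark 2.1. Set $N = \sum_{j=1}^n \delta_j$. Then

$$S_{c,n} = \sqrt N\Big(T_{c,n} - \frac{1}{N}\sum_{j=1}^n \delta_j\psi(\xi_j)\Big)$$

is a complete case statistic associated with $(s_n) = (n^{1/2}(t_n - \psi_n))$. Suppose that

$$n^{1/2}\big(t_n(\tilde\xi_1,\dots,\tilde\xi_n) - \psi_n(\tilde\xi_1,\dots,\tilde\xi_n)\big) = o_p(1).$$

Then, by (c), we have

$$S_{c,n} = \sqrt N\Big(T_{c,n} - \frac{1}{N}\sum_{j=1}^n \delta_j\psi(\xi_j)\Big) = o_p(1)$$

and, consequently,

$$T_{c,n} = \frac{1}{N}\sum_{j=1}^n \delta_j\psi(\xi_j) + o_p(n^{-1/2}).$$

This is the basis for the transfer principle for asymptotically linear statistics.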

3. Efficiency considerations for the partially linear MAR model. Here we shall show that the expansion (1.2) characterizes efficient estimators in the partially linear MAR model. For this we only need to show that the influence function appearing in (1.2) is the efficient influence function for estimating $\vartheta$ in this model. We formulate this as the main result of this section; see Lemma 3.1. By the discussion in the Introduction, we must require that the conditional distribution $\tilde G$ of $(U,V)$, given $\delta = 1$, satisfies the assumptions (G1) and (G2). This is crucial for the transfer principle to apply, and holds if the function $\pi$ is bounded away from zero, as we shall show first.

Remark 3.1. Consider the conditional distribution $\tilde G$ of $(U,V)$ given $\delta = 1$. Then $\tilde G$ satisfies the properties (G1) and (G2) if $\pi$ is bounded away from zero: it is easy to check that $\tilde G$ has density $\tilde\pi$ with respect to $G$, where $\tilde\pi(U,V) = \pi(U,V)/E[\delta]$. If $\tilde\pi \ge \eta$ for some positive constant $\eta$, then

$$\eta\int |h|\,dG \le \int |h|\,d\tilde G \le \int |h|\,dG/E[\delta]$$

for all $h\in L_1(G)$ and, therefore,

$$\begin{aligned}
a^\top W_{\tilde G}a = \int |a^\top(u - \mu_{\tilde G}(v))|^2\,d\tilde G(u,v) &\ge \eta\int |a^\top(u - \mu_{\tilde G}(v))|^2\,dG(u,v)\\
&\ge \eta\int |a^\top(u - \mu_G(v))|^2\,dG(u,v) = \eta\,a^\top W_Ga \quad\text{for all } a\in\mathbb R^m.
\end{aligned}$$

From these inequalities we conclude that $\tilde G$ inherits the properties (G1) and (G2) from $G$ if $\pi$ is bounded away from zero.

Lemma 3.1. Suppose the model (1.1) holds with $\rho$ being twice continuously differentiable and error density $f$ having finite Fisher information for location. Also assume $\pi$ is bounded away from zero. Then an efficient estimator of the parameter $\vartheta$ in the MAR model is characterized by (1.2). As a consequence, the complete case version of an efficient estimator of the parameter $\vartheta$ in the full model is efficient for the MAR model.

Proof. We rely heavily on the calculations in Müller, Schick and Wefelmeyer (2006). The authors considered the general missing data problem with base observation $(X,\delta Y,\delta)$ where $X$ and $Y$ do not have to follow a regression model. They expressed the joint distribution $P$ of $(X,\delta Y,\delta)$ via

$$P(dx,dy,dz) = G(dx)\,B_{\pi(x)}(dz)\big(zQ(x,dy) + (1-z)\Delta_0(dy)\big)$$

in terms of the distribution $G$ of $X$, the conditional probability $\pi(x)$ of $\delta = 1$ given $X = x$, and the conditional distribution $Q(x,dy)$ of $Y$ given $X = x$. Here $B_p$ denotes the Bernoulli distribution with parameter $p$ and $\Delta_t$ the Dirac measure at $t$. They showed that the tangent space is the sum of the orthogonal spaces

$$T_1 = \{u(X): u\in\mathcal U\},\qquad T_2 = \{\delta v(X,Y): v\in\mathcal V\},\qquad T_3 = \{(\delta - \pi(X))w(X): w\in\mathcal W\}.$$

Here, the set $\mathcal U$ consists of all real-valued functions $u$ satisfying $\int u\,dG = 0$, $\int u^2\,dG < \infty$ and for which there is a sequence $G_{nu}$ of distributions fulfilling the model assumptions on $G$ and

$$\int\Big(n^{1/2}\big(dG_{nu}^{1/2} - dG^{1/2}\big) - \frac12 u\,dG^{1/2}\Big)^2 \to 0.$$

The set $\mathcal W$ consists of real-valued functions $w$ with the property $\int w^2\pi(1-\pi)\,dG < \infty$ for which there is a sequence $\pi_{nw}$ satisfying the model assumptions on $\pi$ such that

$$\int\!\!\int\Big(n^{1/2}\big(dB_{\pi_{nw}(x)}^{1/2} - dB_{\pi(x)}^{1/2}\big) - \frac12(\cdot - \pi(x))w(x)\,dB_{\pi(x)}^{1/2}\Big)^2G(dx) \to 0.$$

Finally, the set $\mathcal V$ consists of functions $v$ with the properties $\int v(x,y)Q(x,dy) = 0$ for all $x$ and $\int v^2(x,y)G(dx)Q(x,dy) < \infty$, and for which there is a sequence $Q_{nv}$ satisfying the model assumptions on $Q$ and

$$\int\!\!\int\Big(n^{1/2}\big(dQ_{nv}^{1/2}(x,\cdot) - dQ^{1/2}(x,\cdot)\big) - \frac12 v(x,\cdot)\,dQ^{1/2}(x,\cdot)\Big)^2G(dx) \to 0.$$

In the partially linear regression model (1.1) we have $X = (U,V)$ and

$$Q(x,dy) = Q_{\vartheta,\rho,f}(u,v,dy) = f(y - \vartheta^\top u - \rho(v))\,dy,$$

where the density $f$ has finite Fisher information for location, $\vartheta$ belongs to $\mathbb R^m$ and $\rho$ is a smooth function. For this model $\mathcal V$ consists of the functions

$$a^\top U\ell_f(\varepsilon) + b(V)\ell_f(\varepsilon) + c(\varepsilon)$$

with $a\in\mathbb R^m$, $E[b^2(V)] < \infty$ and $c\in L_2(F)$ with $\int c(y)\,dF(y) = 0$ and $\int c(y)y\,dF(y) = 0$. Since we are interested in estimating the finite-dimensional parameter $\vartheta$, we introduce the functional

$$\kappa(G,Q_{\vartheta,\rho,f},\pi) = \vartheta.$$

Now consider

$$g(X,\delta Y,\delta) = \delta(U - \mu_1(V))\ell_f(\varepsilon)$$

with $\mu_1(V) = E(U\mid V,\delta = 1)$. Then the coordinates of $g(X,\delta Y,\delta)$ belong to $T_2$. Thus, we have $E[g(X,\delta Y,\delta)u(X)] = 0$ and $E[g(X,\delta Y,\delta)(\delta - \pi(X))w(X)] = 0$. Note that $\varepsilon$ and $(\delta,X)$ are independent, and that we have $E[\ell_f(\varepsilon)] = 0$ and $E[\ell_f^2(\varepsilon)] = J_f$. Using this and the definition of $\mu_1$, we calculate

$$\begin{aligned}
&E[g(X,\delta Y,\delta)\,\delta(a^\top U\ell_f(\varepsilon) + b(V)\ell_f(\varepsilon) + c(\varepsilon))]\\
&\quad= E[\delta(U - \mu_1(V))(U^\top a + b(V))]J_f + E[\delta(U - \mu_1(V))]E[\ell_f(\varepsilon)c(\varepsilon)]\\
&\quad= E[\delta(U - \mu_1(V))(U - \mu_1(V))^\top]aJ_f.
\end{aligned}$$

From this we can conclude that the functional $\kappa$ is differentiable with canonical gradient $g_*(X,\delta Y,\delta)$ of the form

$$\delta\big(J_fE[\delta(U - \mu_1(V))(U - \mu_1(V))^\top]\big)^{-1}(U - \mu_1(V))\ell_f(\varepsilon).$$

This canonical gradient is the influence function of an efficient estimator of $\vartheta$. Now use the fact that $\mu_1(V)$ equals $\mu_{\tilde G}(V)$ and $E[(U - \mu_1(V))(U - \mu_1(V))^\top\mid\delta = 1]$ equals $W_{\tilde G}$ to see that this is indeed the characterization (1.2). □

Remark 3.2. The above efficiency result extends in a straightforward manner to the case when $V$ is higher dimensional. It also extends to the partially linear additive model

$$Y = \vartheta^\top U + \rho_1(V_1) + \rho_2(V_2) + \varepsilon,$$

where $(V_1,V_2)$ takes values in the unit square $[0,1]^2$ and has a density that is bounded and bounded away from zero on the unit square. Let $G$ now denote the joint distribution of $(U,V_1,V_2)$. Assume that the matrix $E[(U - \mu_G(V_1,V_2))(U - \mu_G(V_1,V_2))^\top]$, with $\mu_G(V_1,V_2) = E(U\mid V_1,V_2)$, is positive definite, and that $\pi$ is bounded away from zero. In the present model the space $\mathcal V$ consists of functions of the form

$$a^\top U\ell_f(\varepsilon) + (b_1(V_1) + b_2(V_2))\ell_f(\varepsilon) + c(\varepsilon),$$

where $E[b_1^2(V_1) + b_2^2(V_2)]$ is finite. The role of $g$ is now played by

$$g(X,\delta Y,\delta) = \delta(U - \tilde\nu_1(V_1) - \tilde\nu_2(V_2))\ell_f(\varepsilon),$$

where $\tilde\nu_1(V_1) + \tilde\nu_2(V_2)$ minimizes $E[\|U - B_1(V_1) - B_2(V_2)\|^2\mid\delta = 1]$ with respect to functions $B_1$ and $B_2$ from $[0,1]$ into $\mathbb R^m$ such that $E[\|B_1(V_1)\|^2]$ and $E[\|B_2(V_2)\|^2]$ are finite. The efficient influence function is

$$\frac{\delta}{E[\delta]}\big(J_fE[(U - \tilde\nu_1(V_1) - \tilde\nu_2(V_2))(U - \tilde\nu_1(V_1) - \tilde\nu_2(V_2))^\top\mid\delta = 1]\big)^{-1}(U - \tilde\nu_1(V_1) - \tilde\nu_2(V_2))\ell_f(\varepsilon).$$

By the transfer principle, this is the influence function of a complete case version of an estimator with influence function

$$\big(J_fE[(U - \nu_1(V_1) - \nu_2(V_2))(U - \nu_1(V_1) - \nu_2(V_2))^\top]\big)^{-1}(U - \nu_1(V_1) - \nu_2(V_2))\ell_f(\varepsilon)$$

in the full model, where $\nu_1(V_1) + \nu_2(V_2)$ minimizes $E[\|U - B_1(V_1) - B_2(V_2)\|^2]$ over functions $B_1$ and $B_2$ as above. Schick (1996) constructed estimators in the full model that have the latter influence function. In particular, he established their efficiency by showing that the above influence function is indeed the efficient influence function in the full model.

Remark 3.3. In the above we have shown that for the partially linear MAR model (with a possibly additive smooth function) an efficient estimator of the parameter $\vartheta$ can be obtained as the complete case version of an efficient estimator in the full model. This is typically also true for other more general semiparametric regression models and can be verified along the above lines. We sketch this for the following single index model.

In this model $Y = \rho(V + \vartheta^\top U) + \varepsilon$ with one-dimensional $V$, $m$-dimensional $U$, $\vartheta\in\mathbb R^m$ and twice continuously differentiable $\rho$. Assume again that $\pi$ is bounded away from zero. The space $\mathcal V$ for this model consists of functions

$$a^\top U\rho'(V + \vartheta^\top U)\ell_f(\varepsilon) + b(V + \vartheta^\top U)\ell_f(\varepsilon) + c(\varepsilon)$$

with $a\in\mathbb R^m$, $E[b^2(V + \vartheta^\top U)] < \infty$ and $c$ as before. For this we must require that $E[\|U\|^2(\rho'(V + \vartheta^\top U))^2]$ is finite. Now one works with $g(X,\delta Y,\delta) = \delta(U - \nu_1(V + \vartheta^\top U))\rho'(V + \vartheta^\top U)\ell_f(\varepsilon)$ and $\nu_1(V + \vartheta^\top U) = E(U\mid V + \vartheta^\top U,\delta = 1)$, and obtains the canonical gradient

$$g_*(X,\delta Y,\delta) = \frac{\delta}{E[\delta]}(J_fW_1)^{-1}(U - \nu_1(V + \vartheta^\top U))\rho'(V + \vartheta^\top U)\ell_f(\varepsilon)$$

if $W_1 = E[(U - \nu_1(V + \vartheta^\top U))(U - \nu_1(V + \vartheta^\top U))^\top(\rho'(V + \vartheta^\top U))^2\mid\delta = 1]$ is invertible. By the transfer principle, this is the influence function of a complete case version of an estimator with influence function $(J_fW)^{-1}(U - \nu(V + \vartheta^\top U))\rho'(V + \vartheta^\top U)\ell_f(\varepsilon)$, where $\nu(V + \vartheta^\top U) = E[U\mid V + \vartheta^\top U]$ and $W = E[(U - \nu(V + \vartheta^\top U))(U - \nu(V + \vartheta^\top U))^\top(\rho'(V + \vartheta^\top U))^2]$. The latter influence function is the efficient gradient for the full model. Indeed, it is the canonical gradient for the case when $\delta = 1$ almost surely.

4. Testing for normal errors. In this section we shall introduce a test for normal errors which uses the Khmaladze transform of the empirical distribution function $\hat F(t) = n^{-1}\sum_{j=1}^n 1[\hat\varepsilon_j \le t]$, $t\in\mathbb R$, based on residuals $\hat\varepsilon_j$. Goodness-of-fit tests for the full model based on that transform were discussed in Khmaladze and Koul (2004, 2009) for parametric and nonparametric regression, and by MSW for the partially linear regression model considered here. Due to the transfer principle, it is now straightforward to adapt the approach by MSW to the MAR model, which is what we will do here for a simple illustration of the method. Note that MSW consider the more complex case where $V$ is a covariate vector.

First, we briefly sketch the approach for the full model. To avoid additional assumptions, we estimate $\vartheta$ and $\rho$ using a least squares approach with the trigonometric basis. This is discussed in MSW, Section 4, for an additive regression function, that is, with $\rho(x_1,\dots,x_q) = \rho_1(x_1) + \dots + \rho_q(x_q)$. (Here we have $q = 1$.) For $k = 1,2,\dots$, we set

$$\phi_k(x) = \cos(\pi kx),\qquad 0\le x\le 1.$$

Our estimator of the regression function $r(u,v) = \vartheta^\top u + \rho(v)$ is then

$$\hat r(u,v) = \hat\vartheta^\top u + \sum_{k=0}^K \hat\beta_k\phi_k(v),$$

where $\phi_0(x) = 1$ and $(\hat\vartheta,\hat\beta_0,\dots,\hat\beta_K)$ minimizes

$$\sum_{j=1}^n\Big(Y_j - a^\top U_j - \sum_{k=0}^K b_k\phi_k(V_j)\Big)^2$$

with respect to $a,b_0,\dots,b_K$. For $j = 1,\dots,n$ the error is estimated by the residual

$$\hat\varepsilon_j = Y_j - \hat\vartheta^\top U_j - \sum_{k=0}^K \hat\beta_k\phi_k(V_j).$$

We also need the normalized residuals $\hat Z_j = \hat\varepsilon_j/\hat\sigma$, where $\hat\sigma$ is the square root of $(1/n)\sum_{j=1}^n \hat\varepsilon_j^2$.

Assume for the remainder of this section that $f$ has finite Fisher information
for location and finite fourth moment. This assumption is met by the
normal density. It then follows from MSW, Theorem 4.1 and Remark 4.2,
that with $K = K_n \sim n^{1/4}$ we have the uniform stochastic expansions



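\[
\sup_{t \in \mathbb{R}} \Bigl| n^{-1/2} \sum_{j=1}^n \bigl( 1[\hat\varepsilon_j \le t] - 1[\varepsilon_j \le t] - f(t)\varepsilon_j \bigr) \Bigr| = o_p(1) \tag{4.1}
\]

and

\[
\sup_{t \in \mathbb{R}} \Bigl| n^{-1/2} \sum_{j=1}^n \Bigl( 1[\hat Z_j \le t] - 1[Z_j \le t] - f_*(t) \Bigl( Z_j + t\,\frac{Z_j^2 - 1}{2} \Bigr) \Bigr) \Bigr| = o_p(1), \tag{4.2}
\]

where $f_*$ denotes the density of the normalized errors $Z_j = \varepsilon_j/\sigma$.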

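Write $\phi$ for the standard normal density. In terms of the density $f_*$, the
null hypothesis is

\[
H_0 : f_* = \phi.
\]

MSW proposed the test statistic

\[
T_n = \sup_{t \in \mathbb{R}} \Bigl| n^{-1/2} \sum_{j=1}^n \bigl( 1[\hat Z_j \le t] - H(t \wedge \hat Z_j)\, h(\hat Z_j) \bigr) \Bigr|
\]

with

\[
h(x) = (1, x, x^2 - 1)^\top, \qquad
\Gamma(x) = \int_x^\infty h(z) h^\top(z) \phi(z)\, dz, \qquad
H(t) = \int_{-\infty}^t h^\top(x) \Gamma^{-1}(x) \phi(x)\, dx. \tag{4.3}
\]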
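As a rough numerical illustration of the matrix $\Gamma(x)$ in (4.3), its entries can be written in closed form using the tail moments of the standard normal distribution, for example $\int_x^\infty z\phi(z)\,dz = \phi(x)$. The sketch below is not taken from MSW. The function names `gamma_closed` and `gamma_numeric` are hypothetical, and the quadrature settings are arbitrary choices used only to cross-check the closed form.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def upper_tail(x):
    """1 - Phi(x) for the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def h(x):
    """The vector h(x) = (1, x, x^2 - 1) from (4.3)."""
    return (1.0, x, x * x - 1.0)


def gamma_closed(x):
    """Gamma(x) = int_x^inf h(z) h(z)^T phi(z) dz via normal tail moments."""
    p, q = phi(x), upper_tail(x)
    g11, g12, g13 = q, p, x * p
    g22 = x * p + q
    g23 = (x * x + 1.0) * p
    g33 = (x ** 3 + x) * p + 2.0 * q
    return [[g11, g12, g13], [g12, g22, g23], [g13, g23, g33]]


def gamma_numeric(x, upper=12.0, steps=50000):
    """Midpoint-rule approximation of the same integral, truncated at `upper`."""
    width = (upper - x) / steps
    total = [[0.0] * 3 for _ in range(3)]
    for i in range(steps):
        z = x + (i + 0.5) * width
        hz, weight = h(z), phi(z) * width
        for a in range(3):
            for b in range(3):
                total[a][b] += hz[a] * hz[b] * weight
    return total
```

Comparing `gamma_closed(0.3)` with `gamma_numeric(0.3)` entry by entry gives a quick sanity check. As $x \to -\infty$ the closed form tends to the diagonal matrix with entries 1, 1, 2, which is $E[h(Z)h^\top(Z)]$ for a standard normal $Z$.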
This is a version of the martingale transform test of Khmaladze and Koul
(2009) for fitting an error distribution in nonparametric regression. MSW
showed that under the null hypothesis the test statistic $T_n$ converges in
distribution to $\zeta$, which is the supremum of a standard Brownian motion
given in (1.3). This holds for every distribution function $G$ satisfying (G1)
and (G2). Since $\varepsilon$ and $(\delta, U, V)$ are independent under the MAR assumption,
the conditional distribution of $(\varepsilon, U, V)$, given $\delta = 1$, is given by $F \times G$,
where $G$ is the conditional distribution of $(U, V)$, given $\delta = 1$. Thus, if $G$
satisfies (G1) and (G2), then the transfer principle applies and yields the
same limiting distribution for the complete case version $T_c$ of $T_n$, where

\[
T_c = \sup_{t \in \mathbb{R}} \Bigl| N^{-1/2} \sum_{j=1}^n \delta_j \bigl( 1[\hat Z_{jc} \le t] - H(t \wedge \hat Z_{jc})\, h(\hat Z_{jc}) \bigr) \Bigr|. \tag{4.4}
\]



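Here $\hat Z_{jc}$ are the complete case versions of the normalized residuals and
are defined by $\hat\varepsilon_{jc}/\hat\sigma_c$ with
$\hat\varepsilon_{jc} = Y_j - \hat\vartheta_c^\top U_j - \sum_{k=0}^{K_N} \hat\beta_{kc} \varphi_k(V_j)$
and $\hat\sigma_c$ the square root of $N^{-1} \sum_{j=1}^n \delta_j \hat\varepsilon_{jc}^2$,
while $(\hat\vartheta_c, \hat\beta_{0c}, \dots, \hat\beta_{K_N c})$ are the least squares
estimators minimizing

\[
\sum_{j=1}^n \delta_j \Bigl( Y_j - a^\top U_j - \sum_{k=0}^{K_N} b_k \varphi_k(V_j) \Bigr)^2.
\]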
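The complete case least squares fit and the normalized residuals described above can be sketched as follows. This is a minimal illustration under several assumptions that are not from the paper: $U$ is taken to be scalar, $N$ is taken to be the number of complete cases, the basis is $\varphi_k(v) = \cos(\pi k v)$ with $\varphi_0 = 1$, and the helper names (`solve`, `design_row`, `complete_case_residuals`), the simulated data and the value `k_max=4` are all hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def design_row(u, v, k_max):
    """Row (u, phi_0(v), ..., phi_K(v)) with phi_k(v) = cos(pi k v)."""
    return [u] + [math.cos(math.pi * k * v) for k in range(k_max + 1)]


def complete_case_residuals(u, v, y, delta, k_max):
    """Normalized residuals computed from the cases with delta_j = 1 only."""
    rows = [design_row(a, b, k_max) for a, b, d in zip(u, v, delta) if d == 1]
    resp = [c for c, d in zip(y, delta) if d == 1]
    p = len(rows[0])
    # Normal equations of the delta-weighted least squares problem.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * c for r, c in zip(rows, resp)) for i in range(p)]
    coef = solve(xtx, xty)
    res = [c - sum(w * x for w, x in zip(coef, r)) for r, c in zip(rows, resp)]
    sigma = math.sqrt(sum(e * e for e in res) / len(res))  # len(res) plays the role of N
    return [e / sigma for e in res]


# Simulated example; responses are missing completely at random here,
# which is a special case of MAR (an assumption made for simplicity).
rng = random.Random(0)
n = 400
u = [rng.gauss(0, 1) for _ in range(n)]
v = [rng.random() for _ in range(n)]
delta = [1 if rng.random() < 0.7 else 0 for _ in range(n)]
y = [d * (1.5 * a + math.sin(2 * b) + rng.gauss(0, 1)) for a, b, d in zip(u, v, delta)]
z = complete_case_residuals(u, v, y, delta, k_max=4)
```

Setting every `delta` entry to 1 recovers the full model residuals, which mirrors how the complete case version is obtained from the full model procedure.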



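The transfer principle for asymptotically linear statistics also provides
complete case versions of the expansions (4.1) and (4.2) from above. The
first expansion becomes

\[
\sup_{t \in \mathbb{R}} \Bigl| N^{-1/2} \sum_{j=1}^n \delta_j \bigl( 1[\hat\varepsilon_{jc} \le t] - 1[\varepsilon_j \le t] - f(t)\varepsilon_j \bigr) \Bigr| = o_p(1),
\]

and the second expansion becomes

\[
\sup_{t \in \mathbb{R}} \Bigl| N^{-1/2} \sum_{j=1}^n \delta_j \Bigl( 1[\hat Z_{jc} \le t] - 1[Z_j \le t] - f_*(t) \Bigl( Z_j + t\,\frac{Z_j^2 - 1}{2} \Bigr) \Bigr) \Bigr| = o_p(1).
\]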



5. Testing for linearity. In this section we address testing whether the
function $\rho$ in the partially linear MAR model is constant. In the previous
section we demonstrated how the transfer principle can be used to adapt 
a known test for the full model to the MAR model. We now show how to 
develop a test procedure for the MAR model when no counterpart to the 
full model exists. Our approach is to first develop a procedure for the full 
model, and then to apply the transfer principle. Our test statistic is inspired 
by that in Stute, Xu and Zhu (2008). 
Under the null hypothesis the partially linear model reduces to the linear
regression model $Y = a + \vartheta^\top U + \varepsilon$, where $a$ is an unknown constant, that
is, we have

\[
H_0 : \rho(v) = a \ \text{ for all } v \in \mathbb{R} \text{ and some } a \in \mathbb{R}.
\]

To simplify notation, we introduce $\beta = (a, \vartheta^\top)^\top$ and $Z = (1, U^\top)^\top$. Then we
can write the model under the null hypothesis as $Y = \beta^\top Z + \varepsilon$.

It follows from (G2) that the dispersion matrix $\Lambda_G$ of $U$ is positive definite.
From this we immediately see that the matrix

\[
M_G = E[ZZ^\top] = \begin{pmatrix} 1 & E[U^\top] \\ E[U] & E[UU^\top] \end{pmatrix}
\]

is also positive definite. Thus, the least squares estimator $\hat\beta$ of $\beta = (a, \vartheta^\top)^\top$
is root-$n$ consistent under the null hypothesis, as it satisfies

\[
\hat\beta = \beta + M_G^{-1} \frac{1}{n} \sum_{j=1}^n Z_j \varepsilon_j + o_p(n^{-1/2}).
\]



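Now let $\chi$ denote a continuous non-constant function on $[0, 1]$. Introduce
the least squares estimator $\hat\gamma$ for regressing the responses $\chi(V_j)$ on the design
vectors $Z_j$, so that $\hat\gamma$ minimizes

\[
\frac{1}{n} \sum_{j=1}^n \bigl( \chi(V_j) - \gamma^\top Z_j \bigr)^2.
\]

Set $R_j = \chi(V_j) - \hat\gamma^\top Z_j$, $W_j = R_j / (n^{-1} \sum_{i=1}^n R_i^2)^{1/2}$ and
$\hat\varepsilon_{j0} = Y_j - \hat\beta^\top Z_j$, $j = 1, \dots, n$. Our test statistic in the full model is

\[
T_n = \sup_{t \in \mathbb{R}} \Bigl| n^{-1/2} \sum_{j=1}^n W_j 1[\hat\varepsilon_{j0} \le t] \Bigr|.
\]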



As in Stute, Xu and Zhu (2008), we have the following result.

Lemma 5.1. Suppose the null hypothesis holds and $f$ is uniformly
continuous. Then $T_n$ converges in distribution to $\zeta_0 = \sup_{0 \le t \le 1} |B_0(t)|$, where
$B_0$ denotes a standard Brownian bridge.

Proof. Set

\[
\chi_G(X) = \chi(V) - \gamma_G^\top Z,
\]

where $\gamma_G$ minimizes $E[(\chi(V) - \gamma^\top Z)^2]$. Let $\rho_G = \Lambda_G^{-1} E[\chi(V)(U - E[U])]$.
Then it is easy to check that

\[
\begin{aligned}
\chi_G(X) &= \chi(V) - E[\chi(V)] - \rho_G^\top (U - E[U]) \\
&= \chi(V) - E[\chi(V)] - \rho_G^\top (\mu_G(V) - E[U]) - \rho_G^\top (U - \mu_G(V)).
\end{aligned}
\]

Note also that $\chi$ being non-constant on $[0, 1]$ and $V$ having a positive density
on $[0, 1]$ implies $\chi(V)$ has a positive variance. These facts together with $W_G$
being positive definite guarantee that
$E[\chi_G^2(X)] = \mathrm{Var}(\chi(V) - \rho_G^\top \mu_G(V)) + \rho_G^\top W_G \rho_G$ is positive.

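Next, let $g$ be a measurable function such that $E[g^2(X)]$ is finite and
assume $f$ is uniformly continuous. Then Theorem 2.2.4 of Koul (2002) yields

\[
\sup_{t \in \mathbb{R}} \Bigl| \frac{1}{n} \sum_{j=1}^n g(X_j) \bigl( 1[\hat\varepsilon_{j0} \le t] - 1[\varepsilon_j \le t] \bigr) - f(t) E[g(X) Z^\top](\hat\beta - \beta) \Bigr| = o_p(n^{-1/2}).
\]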



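From this fact we obtain

\[
\sup_{t \in \mathbb{R}} \Bigl| \frac{1}{n} \sum_{j=1}^n R_j \bigl( 1[\hat\varepsilon_{j0} \le t] - 1[\varepsilon_j \le t] \bigr) - f(t) \hat D (\hat\beta - \beta) \Bigr| = o_p(n^{-1/2}),
\]

where

\[
\hat D = E[\chi(V) Z^\top] - \hat\gamma^\top E[ZZ^\top] = E[\chi_G(X) Z^\top] + o_p(1).
\]

In view of the identities $E[\chi_G(X) Z^\top] = 0$ and $\sum_{j=1}^n R_j = 0$,
we can conclude

\[
\sup_{t \in \mathbb{R}} \Bigl| \frac{1}{n} \sum_{j=1}^n R_j 1[\hat\varepsilon_{j0} \le t] - \frac{1}{n} \sum_{j=1}^n R_j \bigl( 1[\varepsilon_j \le t] - F(t) \bigr) \Bigr| = o_p(n^{-1/2}).
\]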



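Writing $R_j - \chi_G(X_j) = -(\hat\gamma - \gamma_G)^\top Z_j$, we derive the expansions

\[
\sup_{t \in \mathbb{R}} \Bigl| \frac{1}{n} \sum_{j=1}^n \bigl( R_j - \chi_G(X_j) \bigr) \bigl( 1[\varepsilon_j \le t] - F(t) \bigr) \Bigr| = o_p(n^{-1/2}),
\]

\[
\frac{1}{n} \sum_{j=1}^n \bigl( R_j - \chi_G(X_j) \bigr)^2 \le \|\hat\gamma - \gamma_G\|^2 \, \frac{1}{n} \sum_{j=1}^n \|Z_j\|^2 = o_p(1),
\]

and therefore obtain $n^{-1} \sum_{j=1}^n R_j^2 = E[\chi_G^2(X)] + o_p(1)$. The above
derivations in turn yield

\[
\sup_{t \in \mathbb{R}} \Bigl| n^{-1/2} \sum_{j=1}^n W_j 1[\hat\varepsilon_{j0} \le t] - n^{-1/2} \sum_{j=1}^n \chi_G^*(X_j) \bigl( 1[\varepsilon_j \le t] - F(t) \bigr) \Bigr| = o_p(1),
\]

with $\chi_G^* = \chi_G / E[\chi_G^2(X)]^{1/2}$. Since, again by Theorem 2.2.4 of Koul (2002),
the process

\[
n^{-1/2} \sum_{j=1}^n \chi_G^*(X_j) \bigl( 1[\varepsilon_j \le t] - F(t) \bigr), \qquad -\infty \le t \le \infty,
\]

converges in $D([-\infty, \infty])$ to a time-changed Brownian bridge $B_0(F)$, we
conclude that $T_n$ has the desired limiting distribution. □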
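The complete case version of $T_n$ is given by

\[
T_c = \sup_{t \in \mathbb{R}} \Bigl| \frac{1}{\sqrt{N}} \sum_{j=1}^n \delta_j R_{jc} 1[\hat\varepsilon_{jc} \le t] \Big/ \Bigl( \frac{1}{N} \sum_{j=1}^n \delta_j R_{jc}^2 \Bigr)^{1/2} \Bigr| \tag{5.1}
\]

with $\hat\varepsilon_{jc} = Y_j - \hat\beta_c^\top Z_j$, $R_{jc} = \chi(V_j) - \hat\gamma_c^\top Z_j$,

\[
\hat\beta_c = \arg\min_{\beta} \sum_{j=1}^n \delta_j (Y_j - \beta^\top Z_j)^2
\quad \text{and} \quad
\hat\gamma_c = \arg\min_{\gamma} \sum_{j=1}^n \delta_j (\chi(V_j) - \gamma^\top Z_j)^2.
\]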

By the transfer principle, the limiting distribution of $T_c$ under the null
hypothesis will be that of $\zeta_0$ from the above lemma, as long as $f$ is uniformly
continuous and $G$ satisfies (G1) and (G2).
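The complete case statistic for testing linearity can be computed along the following lines. This is only a sketch under stated assumptions, not the authors' implementation: $U$ is taken to be scalar so that both regressions on $Z = (1, U)^\top$ reduce to simple linear regressions, $N$ is taken to be the number of complete cases, and the names `ols_simple`, `linearity_statistic`, the choice $\chi(v) = \cos(\pi v)$ and the simulated data are hypothetical.

```python
import math
import random


def ols_simple(x, y):
    """Least squares fit of y on (1, x); returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def linearity_statistic(u, v, y, delta, chi):
    """Complete case sup-statistic for scalar U, using only cases with delta_j = 1."""
    obs = [(a, b, c) for a, b, c, d in zip(u, v, y, delta) if d == 1]
    uc = [a for a, _, _ in obs]
    b0, b1 = ols_simple(uc, [c for _, _, c in obs])        # regression of Y on (1, U)
    g0, g1 = ols_simple(uc, [chi(b) for _, b, _ in obs])   # regression of chi(V) on (1, U)
    eps = [c - b0 - b1 * a for a, _, c in obs]             # residuals of the linear fit
    r = [chi(b) - g0 - g1 * a for a, b, _ in obs]          # weights R_jc
    n_obs = len(obs)
    scale = math.sqrt(sum(x * x for x in r) / n_obs)
    # The process is a step function in t that jumps at the residuals, so
    # the supremum is the largest absolute partial sum in residual order.
    partial, sup = 0.0, 0.0
    for _, rj in sorted(zip(eps, r)):
        partial += rj
        sup = max(sup, abs(partial))
    return sup / (scale * math.sqrt(n_obs))


def chi(t):
    """A continuous non-constant function on [0, 1] (illustrative choice)."""
    return math.cos(math.pi * t)


# Simulated data under the null hypothesis; the missingness probability
# depends on V only, so the responses are missing at random.
rng = random.Random(1)
n = 500
u = [rng.gauss(0, 1) for _ in range(n)]
v = [rng.random() for _ in range(n)]
delta = [1 if rng.random() < 0.5 + 0.4 * b else 0 for b in v]
y_null = [d * (2.0 + 1.5 * a + rng.gauss(0, 1)) for a, d in zip(u, delta)]
t_null = linearity_statistic(u, v, y_null, delta, chi)
```

Since the supremum of the absolute value of a standard Brownian bridge follows the Kolmogorov distribution, the statistic can be compared with the familiar asymptotic 5% critical value of about 1.358. The statistic is unchanged when `chi` is multiplied by a positive constant, because the weights are normalized.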

Remark 5.1. The above is easily extended to cover testing for other
parametric forms for $\rho$. For example, we can test whether $\rho$ is linear, $\rho(v) =
a + bv$. In this case we proceed as above, but with the role of $Z$ now played
by the vector $(1, U^\top, V)^\top$ and with $\chi$ chosen to be nonlinear.